Information Extraction Using Metadata and Solving Polysemy Problems
Abstract
Data mining is the exploration and evaluation of large quantities of data to discover substantial, novel, useful, and readily understandable information. Determining what a document is about is therefore a necessary task in data mining. In general, there are three approaches to metadata: stylistic, machine learning, and knowledge-based. A problem arises when the document being mined contains polysemous words, which leads to irrelevant extractions and increased processing time. Polysemy is the coexistence of several possible meanings for a word or phrase. To extract exact information, issues such as polysemy must be resolved. This work uses knowledge-based metadata to extract information with a Domain-based Information Extraction (DIE) technique. It aims to resolve polysemy, which can increase the accuracy of information extraction and reduce processing time. By applying the method to a large set of engineering domains, covering fields such as computer science, biomedical, nanotechnology, and physics, the work shows that information extraction is efficient for day-to-day applications, with reduced processing time.
Keywords: data mining, information extraction, metadata, polysemy, domain-based extraction.
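The abstract does not give the details of the DIE technique, so the following is only a minimal sketch of the general idea it describes: use a knowledge base to decide which domain a document belongs to, then use that domain to pick the meaning of a polysemous word. The dictionaries `DOMAIN_TERMS` and `SENSES`, the helper names, and the simple term-overlap scoring are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical knowledge base: domain -> indicator terms (illustrative only).
DOMAIN_TERMS = {
    "computer science": {"algorithm", "network", "software", "memory", "compiler"},
    "biomedical": {"protein", "tissue", "patient", "membrane", "gene"},
    "physics": {"particle", "energy", "quantum", "field", "momentum"},
}

# Hypothetical sense inventory: polysemous word -> domain -> short gloss.
SENSES = {
    "cell": {
        "computer science": "unit of storage or a coverage area in a network",
        "biomedical": "basic structural unit of a living organism",
        "physics": "device that converts energy, such as a solar cell",
    },
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def detect_domain(text):
    """Return the domain whose indicator terms occur most often, or None."""
    tokens = tokenize(text)
    scores = Counter()
    for domain, terms in DOMAIN_TERMS.items():
        scores[domain] = sum(1 for token in tokens if token in terms)
    domain, score = scores.most_common(1)[0]
    return domain if score > 0 else None


def resolve_senses(text):
    """Map each known polysemous word in the text to its domain-specific gloss."""
    domain = detect_domain(text)
    resolved = {}
    for token in set(tokenize(text)):
        if token in SENSES and domain in SENSES[token]:
            resolved[token] = SENSES[token][domain]
    return domain, resolved


if __name__ == "__main__":
    sample = "The protein crosses the membrane of the cell in patient tissue."
    print(resolve_senses(sample))
```

In this sketch the domain is chosen once per document, so every polysemous word is read in that domain's sense. A real system would presumably need a much larger knowledge base and a way to handle documents that mix domains.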
Similar resources
Automatic Biomedical Term Polysemy Detection
Polysemy is the capacity for a word to have multiple meanings. Polysemy detection is a first step for Word Sense Induction (WSI), which makes it possible to find the different meanings of a term. Polysemy detection is also important for information extraction (IE) systems and for building and enriching terminologies and ontologies. In this paper, we present a nov...
Unsupervised Metadata Extraction in Scientific Digital Libraries Using A-Priori Domain-Specific Knowledge
Information extraction from unstructured sources is a crucial step in the semantic annotation of content. The challenge is to support a high-quality automatic (or at least semi-automatic) approach in order to sustain the scalability of the semantic-enabled services of the future. Unsupervised information extraction encompasses a number of underlying research problems, such as natural langua...
Towards Large-Scale Unsupervised Relation Extraction from the Web
The Web brings an open-ended set of semantic relations. Discovering the significant types is very challenging. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relation types in advance, but most of them rely on tagging arguments of predefined types. One recently reported system is able to jointly extract relations and their argument semantic cl...
A New Framework for Unsupervised Semantic Discovery
This paper presents a new framework for the unsupervised discovery of semantic information, using a divide-and-conquer approach to take advantage of contextual regularities and to avoid problems of polysemy and sublanguages. Multiple sets of documents are formed and analyzed to create multiple sets of frames. The overall procedure is wholly unsupervised and domain independent. The end result wi...
Data and Methods for the Production of National Population Estimates: An Overview and Analysis of Available Metadata
Thomas Spoorenberg. Translated by: Elham Fathi, Statistical Center of Iran. Abstract: Official population estimates can be produced using a variety of data sources and methods. These range from the direct extraction of information from continuously updated population registers to procedures for updating the status of a population enumerated previously in a periodic census. Additional sources and ...